Simulating null distributions

Beyond only sampling error

Elizabeth King
Kevin Middleton

Null hypothesis

  • The baseline expectation
  • In statistics, this is often what we expect when only sampling and measurement error cause variation
  • This is the default hypothesis: we require evidence against it to reject it in favor of an alternative
  • Hypotheses are never proven true
    • We either reject the null or fail to reject it

Null Distribution

  • We can evaluate evidence in the context of the null hypothesis if we have a null distribution for some parameter of interest
  • How to get the null distribution
    • Empirically
    • Simulation
    • From analytical solutions (mathematical formulas)
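
As a minimal sketch of the simulation and analytical routes, suppose the parameter of interest is the sample frequency of an allele when 200 allele copies are drawn from a population with a true frequency of 0.41. Both numbers, the seed, and the variable names are illustrative assumptions. The simulated null distribution comes from repeated random draws, and the analytical one from the binomial standard error.

```r
set.seed(1)  # arbitrary seed for this sketch

p_true   <- 0.41   # assumed true allele frequency (illustrative)
n_allele <- 200    # assumed number of allele copies sampled
n_sims   <- 10000  # number of simulated samples

# Simulation: draw many samples and record the allele frequency in each
sim_freq <- rbinom(n_sims, size = n_allele, prob = p_true) / n_allele

# Analytical: standard error of a binomial proportion
se_analytic <- sqrt(p_true * (1 - p_true) / n_allele)

# The spread from the two routes should agree closely
c(simulated = sd(sim_freq), analytical = se_analytic)
```

When an analytical solution exists, it is usually the cheaper option. Simulation becomes useful when the random process is too complex for a formula.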

When the Null Hypothesis Is Not Only Sampling Error

  • Detecting selection
    • Null: genetic drift is driving differences
  • Similarities and differences between species
    • Null: common ancestry is causing similarity
  • Patterns of species diversity
    • Null: random extinction, speciation, & dispersal events drive patterns

Simulated Null Distribution

Identifying Random Processes

  • Define the experimental question
  • What baseline random processes could produce a similar result?

A Simple Example: Drift at One Variant in a Population

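library(tidyverse)

set.seed(8487264)

# Starting frequency of the variant
snpF <- 0.41

# Number of diploid individuals
NN <- 100

# Each individual carries two allele copies (C1 and C2),
# each drawn with probability snpF
pop <- tibble("C1" = rbinom(NN, 1, snpF),
              "C2" = rbinom(NN, 1, snpF))

# Allele frequency among all 2 * NN copies after one round of sampling
newF <- mean(c(pop$C1, pop$C2))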

newF
[1] 0.445

A Simple Example: Drift at One Variant in a Population

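# Number of generations
ngen <- 10

# Number of independent populations
npops <- 10

# Current allele frequency in each population
snpG <- rep(snpF, npops)

# Rows are generations (row 1 is the start); columns are populations
output <- matrix(NA, ngen+1, npops)
output[1,] <- snpG

for(gg in 1:ngen){
  # Draw 2 * NN allele copies per population from its current frequency
  snpG <- sapply(snpG, function(x) mean(rbinom(NN*2,1,x)))
  output[(gg+1),] <- snpG
}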
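
One way to inspect a simulation like this is to plot each population's trajectory across generations. The following is a minimal sketch in base R, assuming the same starting frequency (0.41), population size (100), and number of generations (10). `simulate_drift` is a hypothetical helper introduced here, and it draws each generation with a single vectorized `rbinom` call, which follows the same binomial model but will not reproduce the same random numbers as the loop above.

```r
set.seed(1)  # arbitrary seed for this sketch

# Hypothetical helper: binomial drift at one biallelic variant.
# Returns a (ngen + 1) x npops matrix of allele frequencies.
simulate_drift <- function(p0, n_ind, ngen, npops) {
  freqs <- matrix(NA_real_, nrow = ngen + 1, ncol = npops)
  freqs[1, ] <- p0
  for (gg in seq_len(ngen)) {
    # Draw 2N allele copies per population from the previous frequency
    freqs[gg + 1, ] <- rbinom(npops, size = 2 * n_ind, prob = freqs[gg, ]) /
      (2 * n_ind)
  }
  freqs
}

traj <- simulate_drift(p0 = 0.41, n_ind = 100, ngen = 10, npops = 10)

# One line per population, with generation on the x axis
matplot(0:10, traj, type = "l", lty = 1,
        xlab = "Generation", ylab = "Allele Frequency")
```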

A Simple Example: Drift at One Variant in a Population

ngen <- 10

# Use many more populations to fill out the null distribution
npops <- 1000

snpG <- rep(snpF, npops)

output <- matrix(NA, ngen+1, npops)
output[1,] <- snpG

for(gg in 1:ngen){
  snpG <- sapply(snpG, function(x) mean(rbinom(NN*2,1,x)))
  output[(gg+1),] <- snpG
}

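# Absolute allele frequency difference for every pair of populations
# in the final generation
allD <- abs(apply(combn(output[(ngen + 1),],2), 2, diff))

# Histogram of the simulated null distribution
allD |>
  tibble() |>
  ggplot(aes(allD)) +
  geom_histogram(fill = "grey75", bins = 30) +
  xlab("Allele Frequency Difference")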
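
To use a distribution like this as a null, an observed difference can be compared against it. The sketch below repeats the simulation in vectorized base R so it runs on its own, then computes the proportion of simulated differences at least as large as the observed one. `obs_diff` is a hypothetical value chosen only for illustration, and the seed and variable names are likewise assumptions.

```r
set.seed(1)  # arbitrary seed for this sketch

p0    <- 0.41   # starting allele frequency
n_ind <- 100    # diploid individuals per population
ngen  <- 10     # generations of drift
npops <- 1000   # independent populations

# Drift for ngen generations in npops independent populations
freq <- rep(p0, npops)
for (gg in seq_len(ngen)) {
  freq <- rbinom(npops, size = 2 * n_ind, prob = freq) / (2 * n_ind)
}

# Null distribution: absolute differences between all pairs of populations
null_diffs <- as.vector(dist(freq))

obs_diff <- 0.25  # hypothetical observed difference, for illustration only

# Proportion of null differences at least as large as the observed one
p_emp <- mean(null_diffs >= obs_diff)
p_emp
```

A small proportion would suggest that drift alone rarely produces a difference as large as the one observed.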

Comparison to Sampling Error Only
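
One possible way to make this comparison is sketched below, under stated assumptions: the sampling-error-only null holds both populations at a true frequency of 0.41 and samples 2N allele copies from each once, while the drift null lets each population drift independently for 10 generations. `sample_freq` and `drift` are hypothetical helpers, and the number of pairs and the seed are arbitrary. Because drift compounds sampling over generations, its null distribution is expected to be wider.

```r
set.seed(1)  # arbitrary seed for this sketch

p0     <- 0.41   # starting allele frequency
n_ind  <- 100    # diploid individuals per population
ngen   <- 10     # generations of drift
npairs <- 5000   # simulated pairs of populations

# Hypothetical helper: allele frequency among 2N copies drawn at frequency p
sample_freq <- function(p, n_ind) {
  rbinom(length(p), size = 2 * n_ind, prob = p) / (2 * n_ind)
}

# Sampling error only: both populations stay at p0 and each is sampled once
diff_sampling <- abs(sample_freq(rep(p0, npairs), n_ind) -
                       sample_freq(rep(p0, npairs), n_ind))

# Drift: each population of a pair drifts independently for ngen generations
drift <- function(p) {
  for (gg in seq_len(ngen)) p <- sample_freq(p, n_ind)
  p
}
diff_drift <- abs(drift(rep(p0, npairs)) - drift(rep(p0, npairs)))

# Compare the upper tails of the two null distributions
q_sampling <- quantile(diff_sampling, 0.95)
q_drift    <- quantile(diff_drift, 0.95)
c(sampling_only = unname(q_sampling), drift = unname(q_drift))
```

A difference that looks extreme against the sampling-error-only null may be unremarkable against the drift null, which is why the choice of null matters.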

Adding Complexity

  • Random differences in reproductive success
  • Replicate populations
  • Many loci
  • Other evolutionary models
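
As a sketch of how two of these extensions might look, the block below simulates drift at many loci in two replicate populations. It assumes loci are independent (unlinked) and draws starting frequencies uniformly between 0.05 and 0.95; both are illustrative choices. `drift_loci` is a hypothetical helper.

```r
set.seed(1)  # arbitrary seed for this sketch

n_ind <- 100   # diploid individuals per population
ngen  <- 10    # generations of drift
nloci <- 500   # number of loci (assumed independent)

# Assumed starting frequencies: one per locus, uniform for illustration
p_start <- runif(nloci, min = 0.05, max = 0.95)

# Hypothetical helper: drift at every locus at once
drift_loci <- function(p, n_ind, ngen) {
  for (gg in seq_len(ngen)) {
    p <- rbinom(length(p), size = 2 * n_ind, prob = p) / (2 * n_ind)
  }
  p
}

# Two replicate populations founded from the same starting frequencies
pop1 <- drift_loci(p_start, n_ind, ngen)
pop2 <- drift_loci(p_start, n_ind, ngen)

# Per-locus null differences between the replicates
locus_diffs <- abs(pop1 - pop2)
summary(locus_diffs)
```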